feat: add pushedAt field (R5/R13) — pipe GitHub pushed_at through all…#21

Merged
rainxchzed merged 4 commits into main from fix/per-migration-transaction
May 21, 2026

@rainxchzed
Member

@rainxchzed rainxchzed commented May 21, 2026

… layers

  • V18 migration: add pushed_at_gh TIMESTAMPTZ column to repos
  • Expose pushedAt in RepoResponse (distinct from updatedAt/metadata change)
  • Persist in GitHubSearchClient ingest, upsertMetadataOnly, and Meili sync
  • Map in RepoRepository, SearchRepository, MeiliRepoHit, all route mappers
  • Add POST /internal/backfill-pushed-at to fill NULL rows on existing data

rainxchzed and others added 3 commits May 21, 2026 13:58
… layers

- V18 migration: add pushed_at_gh TIMESTAMPTZ column to repos
- Expose pushedAt in RepoResponse (distinct from updatedAt/metadata change)
- Persist in GitHubSearchClient ingest, upsertMetadataOnly, and Meili sync
- Map in RepoRepository, SearchRepository, MeiliRepoHit, all route mappers
- Add POST /internal/backfill-pushed-at to fill NULL rows on existing data

Co-Authored-By: Oz <oz-agent@warp.dev>
Gone and Archived branches in runBackfill never wrote pushed_at_gh,
so those rows kept appearing in the pushed_at_gh IS NULL filter on
every subsequent invocation. Fix: call markPushedAtFallback() for
both cases, which stamps COALESCE(updated_at_gh, indexed_at) as a
proxy so they are never reconsidered. TransientFailure is
intentionally left NULL to be retried. Update endpoint comment to
accurately describe termination semantics per outcome branch.

Co-Authored-By: Oz <oz-agent@warp.dev>
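
The per-outcome behaviour described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `RefreshOutcome`, `RepoRow`, and `pushedAtToStore` are hypothetical names, and only the branch names and the `COALESCE(updated_at_gh, indexed_at)` fallback come from the commit message.

```kotlin
import java.time.OffsetDateTime

// Hypothetical stand-ins for the outcome branches named in the commit message.
sealed interface RefreshOutcome {
    data class Ok(val pushedAt: OffsetDateTime) : RefreshOutcome
    object Gone : RefreshOutcome
    object Archived : RefreshOutcome
    object TransientFailure : RefreshOutcome
}

// Minimal row model (assumed); fields mirror the columns mentioned above.
data class RepoRow(val updatedAtGh: OffsetDateTime?, val indexedAt: OffsetDateTime)

// Returns the value to write to pushed_at_gh, or null to leave the row for a retry.
fun pushedAtToStore(row: RepoRow, outcome: RefreshOutcome): OffsetDateTime? = when (outcome) {
    is RefreshOutcome.Ok -> outcome.pushedAt
    // Same idea as COALESCE(updated_at_gh, indexed_at): a proxy value so the row
    // stops matching the pushed_at_gh IS NULL filter.
    RefreshOutcome.Gone, RefreshOutcome.Archived -> row.updatedAtGh ?: row.indexedAt
    // Intentionally left NULL so a later invocation retries it.
    RefreshOutcome.TransientFailure -> null
}

fun main() {
    val indexed = OffsetDateTime.parse("2024-01-01T00:00:00Z")
    val row = RepoRow(updatedAtGh = null, indexedAt = indexed)
    println(pushedAtToStore(row, RefreshOutcome.Gone))             // falls back to indexed_at
    println(pushedAtToStore(row, RefreshOutcome.TransientFailure)) // null
}
```

Only the `TransientFailure` branch leaves the row eligible for the next run, which is what makes repeated invocations converge.
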
…cal codes

TopicCodeMapper resolves raw GitHub topic strings to canonical codes
(ai, privacy, security, networking, messaging, browser, social, launcher,
notes, reader, audio, video, photo, backup, self-hosted). Derived at
response-construction time from existing topics field — no DB migration,
no Meili change. All 5 RepoResponse mappers updated.

Frontend can render up to 3 codes as TopicGlyph icons. Replaces the
current hardcoded 12-glyph set + alias map with backend-driven normalization
covering the actual FOSS app taxonomy in the catalog.

Co-Authored-By: Oz <oz-agent@warp.dev>
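
As a rough illustration of response-time topic normalization, the sketch below resolves raw topics to canonical codes in priority order. The names `PRIORITY_ORDER`, `MAPPINGS`, and `resolve` appear elsewhere in this PR; the object name and the alias sets are assumptions made up for the example and cover only three of the fifteen codes.

```kotlin
// Sketch only: the alias sets are hypothetical, not the PR's actual tables.
object TopicCodeMapperSketch {
    // Canonical codes in the order they should appear in a response.
    private val PRIORITY_ORDER = listOf("ai", "privacy", "security")

    // Canonical code -> raw GitHub topics that map to it (assumed aliases).
    private val MAPPINGS: Map<String, Set<String>> = mapOf(
        "ai" to setOf("ai", "machine-learning", "llm"),
        "privacy" to setOf("privacy", "anti-tracking"),
        "security" to setOf("security", "encryption"),
    )

    fun resolve(topics: List<String>): List<String> {
        if (topics.isEmpty()) return emptyList()
        // Compare case-insensitively by lowercasing the raw topics once.
        val lower = topics.map { it.lowercase() }.toSet()
        return PRIORITY_ORDER.filter { code ->
            MAPPINGS.getValue(code).any { it in lower }
        }
    }
}

fun main() {
    // Priority order decides the output order, not the input order.
    println(TopicCodeMapperSketch.resolve(listOf("Encryption", "LLM", "kotlin"))) // [ai, security]
    // A client that renders at most three glyphs can simply truncate.
    println(TopicCodeMapperSketch.resolve(listOf("kotlin")).take(3))              // []
}
```

Because the codes are derived from the existing topics field on every call, nothing has to be stored or re-indexed when the alias tables change.
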
@coderabbitai
Contributor

coderabbitai Bot commented May 21, 2026

Warning

Rate limit exceeded

@rainxchzed has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 57 minutes and 57 seconds before requesting another review.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8d56534e-9c36-4f8e-86b3-2b608272f06d

📥 Commits

Reviewing files that changed from the base of the PR and between 439f882 and 1f11baf.

📒 Files selected for processing (7)
  • src/main/kotlin/zed/rainxch/githubstore/db/RepoRepository.kt
  • src/main/kotlin/zed/rainxch/githubstore/db/SearchRepository.kt
  • src/main/kotlin/zed/rainxch/githubstore/ingest/GitHubSearchClient.kt
  • src/main/kotlin/zed/rainxch/githubstore/model/RepoResponse.kt
  • src/main/kotlin/zed/rainxch/githubstore/routes/RepoRoutes.kt
  • src/main/kotlin/zed/rainxch/githubstore/routes/SearchRoutes.kt
  • src/main/kotlin/zed/rainxch/githubstore/topics/TopicCodeMapper.kt

@rainxchzed rainxchzed merged commit 15f62a4 into main May 21, 2026
2 checks passed
@greptile-apps

greptile-apps Bot commented May 21, 2026

Greptile Summary

This PR pipes GitHub's pushed_at field (the timestamp of the most recent push to any branch of the repository) through all backend layers: V18 migration, RepoResponse, Postgres upsert, Meilisearch ingest, all route mappers, and a new /internal/backfill-pushed-at endpoint for filling existing NULL rows. It also introduces TopicCodeMapper, a new object that resolves raw GitHub topics to 15 canonical codes, and exposes topicCodes on every RepoResponse path.

  • pushed_at_gh is added as a nullable TIMESTAMPTZ column (V18), populated during ingest and backfill, and surfaced as pushedAt in RepoResponse from both the Postgres and Meilisearch search paths.
  • TopicCodeMapper.resolve maps raw topics to a priority-ordered list of canonical codes (ai, privacy, security, etc.) computed at response time and never persisted.
  • /internal/backfill-pushed-at reuses runBackfill and the backfillRunning gate; Gone/Archived repos receive a COALESCE(updated_at_gh, indexed_at) fallback stamp so they don't resurface in future backfill queries.

Confidence Score: 3/5

Not ready to merge without confirming the Python meili_sync.py is updated; the nightly sync will otherwise silently wipe pushed_at from every Meilisearch document it touches.

The Kotlin-side wiring is well-structured and follows existing patterns, but the nightly Python meili_sync.py run uses POST (full document replacement) and is not updated in this PR. Every nightly sync will overwrite pushed_at with null for all documents, making the Meili search path always return pushedAt = null regardless of what the backfill wrote. CLAUDE.md calls this out explicitly as a required step for any new RepoResponse field. There is also a lock-leak in the new backfill endpoint (shared with the existing one) where a DB error during candidate fetching leaves backfillRunning permanently set until restart.

MeilisearchClient.kt and InternalRoutes.kt need the most attention; the former depends on an out-of-repo Python change and the latter carries the lock-leak risk.

Important Files Changed

| Filename | Overview |
|---|---|
| src/main/kotlin/zed/rainxch/githubstore/db/MeilisearchClient.kt | Adds pushed_at: String? field to MeiliRepoHit; requires a matching update to meili_sync.py (not in this PR) to avoid the Python sync wiping the field on each run. |
| src/main/kotlin/zed/rainxch/githubstore/routes/InternalRoutes.kt | New /backfill-pushed-at endpoint shares runBackfill and the backfillRunning gate; inherits the existing bug where a DB exception during the candidates query leaves the lock permanently set. |
| src/main/kotlin/zed/rainxch/githubstore/topics/TopicCodeMapper.kt | New mapper from raw GitHub topics to 15 canonical codes; uses MAPPINGS.getValue which would throw at runtime if PRIORITY_ORDER and MAPPINGS ever diverge. |
| src/main/kotlin/zed/rainxch/githubstore/db/SearchRepository.kt | Adds pushed_at_gh to the raw SQL SELECT and maps it via rs.getString(), producing a PostgreSQL-format timestamp string that differs from the ISO-8601 format returned by the Meili path. |
| src/main/resources/db/migration/V18__pushed_at.sql | Adds nullable pushed_at_gh TIMESTAMPTZ column to the repos table; correctly additive and registered in DatabaseFactory.kt. |
| src/main/kotlin/zed/rainxch/githubstore/ingest/GitHubSearchClient.kt | Adds pushed_at deserialization from GitHub API, persists it to Postgres with parse guard, and includes it in Meilisearch documents on ingest. |
| src/main/kotlin/zed/rainxch/githubstore/model/RepoResponse.kt | Adds pushedAt and topicCodes fields with appropriate defaults; topicCodes correctly documented as computed-only (never stored). |
| src/main/kotlin/zed/rainxch/githubstore/db/RepoRepository.kt | Maps pushedAtGh and topicCodes cleanly through the Exposed DSL path. |
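
The "parse guard" mentioned for GitHubSearchClient.kt is not shown in this conversation. A guard of that kind might look like the sketch below, where `parsePushedAt` is a hypothetical helper and the choice to map unparsable input to null is an assumption.

```kotlin
import java.time.OffsetDateTime
import java.time.format.DateTimeParseException

// Hypothetical helper: turns GitHub's ISO-8601 pushed_at string into an
// OffsetDateTime, or null when the value is missing or malformed, so one bad
// timestamp cannot fail the whole upsert.
fun parsePushedAt(raw: String?): OffsetDateTime? {
    if (raw.isNullOrBlank()) return null
    return try {
        OffsetDateTime.parse(raw)
    } catch (e: DateTimeParseException) {
        null
    }
}

fun main() {
    println(parsePushedAt("2023-01-15T12:34:56Z")) // parsed value
    println(parsePushedAt("not-a-date"))           // null
}
```
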

Sequence Diagram

sequenceDiagram
    participant GH as GitHub API
    participant GSC as GitHubSearchClient
    participant PG as Postgres (repos)
    participant Meili as Meilisearch
    participant SR as SearchRepository
    participant MeiliSync as meili_sync.py (Python)

    Note over GSC,PG: On-demand ingest path
    GH->>GSC: pushed_at (ISO-8601)
    GSC->>PG: upsert pushed_at_gh (OffsetDateTime)
    GSC->>Meili: "addDocuments (pushed_at = raw string)"

    Note over SR,PG: Postgres search path
    SR->>PG: SELECT pushed_at_gh
    PG-->>SR: JDBC string format

    Note over Meili: Meilisearch search path
    Meili-->>SearchRoutes: pushed_at (ISO-8601 string)

    Note over MeiliSync,Meili: Python sync (nightly)
    MeiliSync->>Meili: POST /documents (no pushed_at field)
    Meili-->>Meili: pushed_at wiped from all docs

    Note over InternalRoutes,PG: /backfill-pushed-at
    InternalRoutes->>PG: SELECT WHERE pushed_at_gh IS NULL
    InternalRoutes->>GH: refreshRepo(fullName)
    GH-->>InternalRoutes: Ok / Gone / Archived / TransientFailure
    InternalRoutes->>PG: UPDATE pushed_at_gh (real or fallback)

Comments Outside Diff (3)

  1. src/main/kotlin/zed/rainxch/githubstore/db/MeilisearchClient.kt, line 152-155 (link)

    P1 meili_sync.py not updated with pushed_at

    CLAUDE.md explicitly requires that when a new field is added to RepoResponse, the Python meili_sync.py query must also be updated — "all five need to agree or data is silently dropped." Since addDocuments() uses POST (full document replacement), the next time the Python fetcher's meili_sync.py runs it will wipe pushed_at from every document it touches, leaving Meili-served search results with pushedAt = null even after the Kotlin backfill has populated the field. There is no mention of a coordinated meili_sync.py update in this PR or its description.
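
    The difference between replacing and merging documents can be seen in a small in-memory model. This is only an illustration of the semantics the comment describes (POST replaces the stored document, while a partial update would merge fields); `FakeIndex` and its methods are made up for the example and do not call Meilisearch.

```kotlin
typealias Doc = Map<String, Any?>

// Hypothetical in-memory stand-in for a search index keyed by "id".
class FakeIndex {
    private val docs = mutableMapOf<Int, Doc>()

    // Models full replacement: fields absent from the incoming document are lost.
    fun addOrReplace(doc: Doc) {
        docs[doc.getValue("id") as Int] = doc
    }

    // Models a partial update: incoming fields are merged over the stored ones.
    fun addOrUpdate(doc: Doc) {
        val id = doc.getValue("id") as Int
        docs[id] = (docs[id] ?: emptyMap()) + doc
    }

    fun find(id: Int): Doc? = docs[id]
}

fun main() {
    val index = FakeIndex()
    index.addOrReplace(mapOf("id" to 1, "name" to "repo", "pushed_at" to "2023-01-15T12:34:56Z"))

    // A sync job that does not know about pushed_at sends the document without it.
    index.addOrReplace(mapOf("id" to 1, "name" to "repo"))
    println(index.find(1)?.containsKey("pushed_at")) // false: the field was wiped
}
```
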

    Context Used: CLAUDE.md (source)

  2. src/main/kotlin/zed/rainxch/githubstore/routes/InternalRoutes.kt, line 188-200 (link)

    P1 backfillRunning lock permanently stuck if candidates query throws

    After compareAndSet(false, true) claims the lock, if the transaction { } block throws (e.g., on a transient DB connection error), execution never reaches the candidates.isEmpty() reset or the finally in backfillScope.launch. The lock stays true for the lifetime of the process, making every subsequent call return 409 until the server restarts. The same pre-existing issue is in /backfill-stale; wrapping the candidate fetch + early-exit in a try/finally that resets the flag on exception would fix both.
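
    The suggested try/finally shape could look roughly like this. It is a sketch under assumptions: `tryStartBackfill`, `fetchCandidates`, and `start` are hypothetical names, and the real endpoint launches a coroutine instead of calling a plain function.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

val backfillRunning = AtomicBoolean(false)

// Hypothetical helper: claims the gate, fetches candidates, and hands them to `start`.
// Returns false when another run already holds the gate (the endpoint would answer 409).
fun <T> tryStartBackfill(
    fetchCandidates: () -> List<T>,
    start: (List<T>) -> Unit,
): Boolean {
    if (!backfillRunning.compareAndSet(false, true)) return false
    var handedOff = false
    try {
        val candidates = fetchCandidates() // may throw on a transient DB error
        if (candidates.isNotEmpty()) {
            start(candidates)              // the started job is now responsible for the reset
            handedOff = true
        }
    } finally {
        // Covers both the empty-candidates early exit and any exception above.
        if (!handedOff) backfillRunning.set(false)
    }
    return true
}

fun main() {
    val result = runCatching { tryStartBackfill<String>({ error("db down") }, {}) }
    println(result.isFailure)      // true: the error still propagates to the caller
    println(backfillRunning.get()) // false: the gate was released
}
```
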

  3. src/main/kotlin/zed/rainxch/githubstore/db/SearchRepository.kt, line 120 (link)

    P2 pushedAt timestamp format differs by search path

    rs.getString("pushed_at_gh") returns the PostgreSQL JDBC string representation of a TIMESTAMPTZ (e.g., 2023-01-15 12:34:56.0+00:00), while MeiliRepoHit.pushed_at carries the raw GitHub ISO-8601 string (e.g., 2023-01-15T12:34:56Z). The client receives different timestamp formats depending on whether the Postgres or the Meili path served the request. The same inconsistency already exists for updatedAt/createdAt, but since pushedAt is explicitly intended for the Heartbeat animation, this divergence is more likely to cause a client-side parse failure.
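
    One way to make both paths agree is to render every timestamp as a UTC ISO-8601 instant before it reaches RepoResponse. The helper below is a hypothetical sketch, not code from the PR.

```kotlin
import java.time.OffsetDateTime

// Hypothetical helper: renders a timestamp as a UTC ISO-8601 instant.
fun toIsoInstant(value: OffsetDateTime?): String? = value?.toInstant()?.toString()

fun main() {
    // In a JDBC mapper the value could be read as a typed object, for example
    // rs.getObject("pushed_at_gh", OffsetDateTime::class.java), instead of rs.getString(...).
    val fromPostgres = OffsetDateTime.parse("2023-01-15T14:34:56+02:00")
    println(toIsoInstant(fromPostgres)) // 2023-01-15T12:34:56Z
}
```

    With this shape the Postgres path emits the same `...T...Z` form that GitHub returns, whatever offset the driver reports.
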

Reviews (1): Last reviewed commit: "Merge branch 'main' into fix/per-migrati..."

Comment on lines +23 to +25
return PRIORITY_ORDER.filter { code ->
MAPPINGS.getValue(code).any { it in lower }
}

P2 MAPPINGS.getValue(code) throws NoSuchElementException at runtime if a code appears in PRIORITY_ORDER but not in MAPPINGS. Both are currently in sync, but if they ever diverge, resolving topicCodes would fail with an exception at request time rather than with a compile-time or unit-test error. Using get with a null check makes the contract more resilient to future edits.

Suggested change
- return PRIORITY_ORDER.filter { code ->
-     MAPPINGS.getValue(code).any { it in lower }
- }
+ return PRIORITY_ORDER.filter { code ->
+     MAPPINGS[code]?.any { it in lower } == true
+ }
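
The two lookups behave differently once the tables drift apart, which the sketch below shows with a deliberately incomplete map. `PRIORITY` and `MAPPINGS_DIVERGED` are hypothetical values chosen for the demonstration, not the PR's tables.

```kotlin
val PRIORITY = listOf("ai", "privacy")

// "privacy" is deliberately missing to simulate the two tables diverging.
val MAPPINGS_DIVERGED: Map<String, Set<String>> = mapOf("ai" to setOf("llm"))

// Mirrors the current code: throws NoSuchElementException on the missing key.
fun resolveStrict(lower: Set<String>): List<String> =
    PRIORITY.filter { code -> MAPPINGS_DIVERGED.getValue(code).any { it in lower } }

// Mirrors the suggested change: a missing key is treated as "no match".
fun resolveLenient(lower: Set<String>): List<String> =
    PRIORITY.filter { code -> MAPPINGS_DIVERGED[code]?.any { it in lower } == true }

fun main() {
    println(runCatching { resolveStrict(setOf("llm")) }.exceptionOrNull()?.javaClass?.simpleName)
    println(resolveLenient(setOf("llm"))) // [ai]
}
```

The lenient form keeps responses working but silently drops the unmapped code, so a unit test asserting `PRIORITY.all { it in MAPPINGS_DIVERGED }` (or its real equivalent) would still be worth having.
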

1 participant